DOCUMENT RESUME 

ED 069 688 TM 002 139 

AUTHOR Veldman, Donald J.; McNemar, Quinn

TITLE In Defense of the Chi-Square Continuity Correction.

SPONS AGENCY Office of Education (DHEW), Washington, D.C.

CONTRACT OEC-6-10-108

NOTE 4p.; Presented at the American Psychological Association

EDRS PRICE MF-$0.65 HC-$3.29

DESCRIPTORS *Goodness of Fit; *Measurement Techniques; *Research
Methodology; Speeches; *Standard Error of Measurement;
*Statistical Analysis; Technical Reports; Test Bias



ABSTRACT 

Published studies of the sampling distribution of
chi-square with and without Yates' correction for continuity have
been interpreted as discrediting the correction. Yates' correction
actually produces a biased chi-square value, which in turn yields a
better estimate of the exact probability of the discrete event
concerned when used in conjunction with the usual tables of
significant chi-square values for one degree of freedom. Data from a
computer simulation demonstrate the validity and importance of using
the continuity correction for chi-square with one degree of freedom.
(Author)
IN DEFENSE OF THE CHI-SQUARE CONTINUITY CORRECTION¹

Donald J. Veldman and Quinn McNemar 
The University of Texas at Austin 




Empirical studies of the sampling distributions of statistics
such as t, F, and χ² can be helpful to the researcher who is concerned
about the dangers of violating the assumptions of the tests he employs.
Outstanding examples are the works of Norton (Lindquist, 1953), Box (1953),
and Boneau (1962), which demonstrated the "robustness" of the F distribution
when the assumptions of normality and homogeneity are broken.
Occasionally, however, simulation (Monte Carlo) studies have
been reported in which empirical data, although sound in themselves,
have been misinterpreted because the wrong question was addressed. A
study by Grizzle (1967) illustrates this type of error. In the
present paper we describe a replication of Grizzle's findings and then
a second simulation study that makes clear the validity of Yates' (1934)
correction for chi-square with one degree of freedom.

Average Chi-Square Values 

A computer program (Veldman, 1969) was written to compute 10,000
chi-square tests from randomly generated frequency data, using a population
proportion of 0.5 for the dichotomy. Each sample had N = 40. Chi-square
values were computed with and without the continuity correction. The
average chi-square value without correction was 1.00, as expected (the mean
of a chi-square variable equals its degrees of freedom), but the average of
the corrected chi-square values was only 0.77. Even more striking, when
corrected chi-squares were used, far fewer values exceeded the tabled
critical values than expected.
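As a rough check on these two averages, they can be computed exactly instead of by simulation, by weighting each possible outcome by its binomial probability. The sketch below is an illustration, not the original 1969 program: it assumes the one-sample form χ² = 4D²/N, where D = K − N/2 is the deviation of one cell from its expected frequency N/2, and it floors the corrected deviation |D| − 0.5 at zero. The function name `expected_chi_squares` is hypothetical.

```python
from math import comb


def expected_chi_squares(n: int) -> tuple[float, float]:
    """Exact means of raw and Yates-corrected chi-square for a 50/50 dichotomy.

    Assumes two cells with expected frequency n/2 each, so that
    chi-square = 4 * D**2 / n with D = K - n/2.
    """
    raw_mean = 0.0
    corrected_mean = 0.0
    for k in range(n + 1):
        prob = comb(n, k) / 2 ** n           # binomial probability of K = k when p = .5
        dev = abs(k - n / 2)                 # |O - E| for either cell
        raw_mean += prob * 4 * dev ** 2 / n
        corrected_dev = max(dev - 0.5, 0.0)  # Yates: shrink |O - E| by 0.5, never below 0
        corrected_mean += prob * 4 * corrected_dev ** 2 / n
    return raw_mean, corrected_mean


raw, corrected = expected_chi_squares(40)
print(f"raw mean = {raw:.2f}, corrected mean = {corrected:.2f}")
# prints: raw mean = 1.00, corrected mean = 0.77
```

Under these assumptions the raw mean is exactly 1 (for p = 0.5 the variance of K is N/4, so the expected value of 4D²/N is 1), and the corrected mean comes out near 0.77, in line with the simulated averages reported above.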



Discrete Events and Continuous Distributions 

The flaw in the previous study is not in the way the empirical 
data were derived; it is in the conceptualization of the problem itself. 

The purpose of the continuity correction is not to provide a more accurate
estimate of the continuous chi-square distribution when discrete (frequency)
data are employed. Although the need for the correction does arise from
the fact that discrete events are not well fitted by continuous distributions
under some conditions, Yates' correction actually yields a biased estimate
of chi-square, which results in a more accurate estimate of the exact
probability of the event concerned.
The ultimate criterion, then, is the exact probability of the
discrete observed event, which for the example problem can be calculated
by means of formula [1]; the formula yields a two-tailed P value.
[1]   $P = \min\left\{1,\; 2\sum_{k=0}^{K}\frac{N!}{k!\,(N-k)!}\left(\tfrac{1}{2}\right)^{N}\right\}$

where N = the sample size

and K = the observed frequency in the smaller of the two cells.
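Formula [1] can be evaluated directly. The sketch below assumes K is the smaller cell count, sums the binomial terms of one tail, doubles that sum for a two-tailed value, and caps the result at 1 for an even split; `exact_two_tailed_p` is a hypothetical name.

```python
from math import comb


def exact_two_tailed_p(n: int, k: int) -> float:
    """Exact two-tailed probability of a k : (n - k) split when p = 0.5."""
    small = min(k, n - k)                     # frequency in the smaller cell
    one_tail = sum(comb(n, j) for j in range(small + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)             # double the tail; cap at 1 for an even split


for k in range(10, 4, -1):
    print(f"{k}-{10 - k}: {exact_two_tailed_p(10, k):.4f}")
```

For N = 10 this loop reproduces the Exact P column of Table 2 (.0020, .0215, .1094, .3438, .7539, 1.0000).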



Exact Probabilities and Chi-Square Values

To demonstrate the validity of the continuity correction, 10,000
random samples (N = 100) were generated by a computer program from a
dichotomous population with a proportion of 0.5. For each sample outcome an
exact probability value was calculated with formula [1]. Chi-square values were
then calculated with and without the continuity correction, and these were
converted to probability estimates by reference to the theoretical chi-square
distribution, using a computer routine (Veldman, 1967, p. 131).²
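The conversion from a chi-square value to a probability can be sketched for one degree of freedom, where the chi-square variable is the square of a standard normal deviate, so P(χ² > x) = erfc(√(x/2)). This sketch uses Python's `math.erfc` in place of the Fortran routine cited above, so its probabilities may differ somewhat from the parenthesized entries in Table 2. It assumes the one-sample χ² = 4D²/N with D = K − N/2 and the corrected deviation floored at zero; the function names are hypothetical.

```python
from math import erfc, sqrt


def chi_square(n: int, k: int, corrected: bool) -> float:
    """One-sample chi-square (two cells, expected n/2 each), optionally Yates-corrected."""
    dev = abs(k - n / 2)
    if corrected:
        dev = max(dev - 0.5, 0.0)   # continuity correction, floored at zero
    return 4 * dev ** 2 / n


def chi_square_p(x: float) -> float:
    """Upper-tail probability for chi-square with 1 df: P(chi2 > x) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2))


# A 9-1 split with N = 10, with and without the correction
for corrected in (False, True):
    x = chi_square(10, 9, corrected)
    print(f"corrected={corrected}: chi2 = {x:.1f}, P = {chi_square_p(x):.4f}")
```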

Table 1 contains the numbers of samples whose probability values
reached each significance level from .01 to .10, using each of the
three methods of deriving probabilities.

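Clearly, the continuity-corrected chi-squares yield probabilities
closer to the exact values than do the raw chi-squares when both are
referenced against the theoretical chi-square distribution. In Table 1 the
corrected counts match the exact-probability counts at every level, whereas
the raw chi-squares yield larger counts at six of the ten levels.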
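The design of the second simulation can be sketched along the same lines. This is a hypothetical reconstruction, not the original program: it draws each sample of N = 100 with Python's `random` module, counts a sample as significant when its probability is at or below the level, and uses the exact 1-df tail via `math.erfc` rather than the 1967 routine. The counts depend on the seed and will not reproduce Table 1 exactly.

```python
import random
from math import comb, erfc, sqrt

# Significance levels .01, .02, ..., .10 as in Table 1
LEVELS = [round(0.01 * i, 2) for i in range(1, 11)]


def exact_p(n: int, k: int) -> float:
    """Two-tailed exact binomial probability of a k : (n - k) split when p = 0.5."""
    small = min(k, n - k)
    return min(1.0, 2 * sum(comb(n, j) for j in range(small + 1)) / 2 ** n)


def chi2_p(n: int, k: int, corrected: bool) -> float:
    """Tail probability of the one-sample chi-square (1 df), raw or Yates-corrected."""
    dev = abs(k - n / 2)
    if corrected:
        dev = max(dev - 0.5, 0.0)
    chi2 = 4 * dev ** 2 / n
    return erfc(sqrt(chi2 / 2))


def simulate(samples: int = 10_000, n: int = 100, seed: int = 1) -> dict[str, list[int]]:
    """Count samples whose probability is at or below each level, for three methods."""
    rng = random.Random(seed)
    counts = {method: [0] * len(LEVELS) for method in ("exact", "raw", "corrected")}
    for _ in range(samples):
        # Number of "successes" in one sample of n from a p = .5 dichotomy
        k = sum(rng.random() < 0.5 for _ in range(n))
        probs = {
            "exact": exact_p(n, k),
            "raw": chi2_p(n, k, corrected=False),
            "corrected": chi2_p(n, k, corrected=True),
        }
        for method, p in probs.items():
            for i, alpha in enumerate(LEVELS):
                if p <= alpha:
                    counts[method][i] += 1
    return counts


if __name__ == "__main__":
    for method, row in simulate().items():
        print(f"{method:>9}: {row}")
```

Because the corrected deviation is never larger than the raw one, the corrected probability is never smaller than the raw probability, so in this sketch the corrected count can never exceed the raw count at any level.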

The reason for the curious fact that fewer than the expected 
numbers of samples reach significance may be inferred from consideration 
of the more extreme case of discreteness when N = 10. There are only six 
possible "splits" that can occur, as shown in Table 2. 

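At the .05 level, only 10-0 and 9-1 splits produce chi-square
values larger than the critical value of 3.841, with or without the continuity
correction. The exact probability of a 10-0 split is .002, while that for
either a 10-0 or a 9-1 split is .0215. Thus only about 2 percent of samples,
rather than the nominal 5 percent, can reach significance at the .05 level.
Because of such discreteness, the counts will usually be less than the
theoretical expectation, particularly when N is small.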
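The N = 10 argument can be checked by enumeration. The sketch below uses the same assumed one-sample χ² = 4D²/N (two cells with expected frequency N/2, corrected deviation floored at zero), lists the values of K whose chi-square exceeds the tabled .05 value of 3.841, and sums their exact binomial probabilities; the names are illustrative.

```python
from math import comb

N = 10
CRITICAL_05 = 3.841  # tabled chi-square value, 1 df, .05 level


def chi_square(k: int, corrected: bool) -> float:
    """One-sample chi-square for a k : (N - k) split, raw or Yates-corrected."""
    dev = abs(k - N / 2)
    if corrected:
        dev = max(dev - 0.5, 0.0)
    return 4 * dev ** 2 / N


for corrected in (False, True):
    significant = [k for k in range(N + 1) if chi_square(k, corrected) > CRITICAL_05]
    # Probability that a sample from a p = .5 population falls on a significant split
    rate = sum(comb(N, k) for k in significant) / 2 ** N
    print(f"corrected={corrected}: significant K = {significant}, rate = {rate:.4f}")
# Both lines end with: significant K = [0, 1, 9, 10], rate = 0.0215
```

Both versions of the statistic reject the same four outcomes (10-0 and 9-1 in either direction), whose combined probability is 22/1024, or about .0215, well below the nominal .05.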

Conclusions

Yates' continuity correction for chi-square with one degree of
freedom is both valid and necessary.

Simulation studies concerning statistical theory must be carefully
designed to avoid misleading recommendations to research workers.
¹This investigation was supported in part by the Research and
Development Center for Teacher Education, United States Office of Education
Contract OEC-6-10-108.



²The published Fortran routine can be improved for probabilities
near 0.5 by retaining the signed value of Z as ZZ and inserting the
following statement just before RETURN: IF (ZZ .LT. 0.0) PRBF = 1.0 - PRBF
Table 1. Numbers of samples reaching statistical significance

                                          Significance Level
Method of Calculation      .01   .02   .03   .04   .05   .06   .07   .08   .09   .10
Exact probability           76   142   249   385   385   610   610   610   932   932
Raw χ²                     142   249   385   385   610   610   932   932   932   932
Corrected χ²                76   142   249   385   385   610   610   610   932   932


Table 2. Possible outcomes with N = 10*

Split  Exact P  Raw χ²      (P)  Corrected χ²      (P)
10-0     .0020    10.0  (.0020)           8.1  (.0047)
9-1      .0215     6.4  (.0111)           4.9  (.0252)
8-2      .1094     3.6  (.0545)           2.5  (.1097)
7-3      .3438     1.6  (.2031)           0.9  (.3448)
6-4      .7539     0.4  (.5344)           0.1  (.7505)
5-5     1.0000     0.0 (1.0000)           0.0 (1.0000)

*Values in parentheses are probabilities derived
from the theoretical χ² distribution.



